Enhance approx_percentile with strict weight validation, improved error handling and IT coverage#17368
Conversation
There was a problem hiding this comment.
Pull request overview
This PR tightens correctness checks for the approx_percentile table aggregation, especially for the weighted (3-argument) form, by adding stricter argument type validation, runtime weight validation, and integration test coverage for the new behaviors.
Changes:
- Enforce that
approx_percentile’s percentage argument is a decimal type, and restrict weight (when provided) toINT32(orUNKNOWNforNULL). - Add runtime validation in
ApproxPercentileWithWeightAccumulatorfor invalid weights (e.g.,< 1) and handle null weights. - Extend table-model integration tests to cover null/invalid weight scenarios and unsupported weight types.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/metadata/TableMetadataImpl.java |
Strengthens analyzer-time type validation for approx_percentile percentage/weight arguments. |
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/ApproxPercentileWithWeightAccumulator.java |
Adds execution-time weight validation/handling for the non-grouped weighted accumulator. |
integration-test/src/test/java/org/apache/iotdb/relational/it/query/recent/IoTDBTableAggregationIT.java |
Adds IT coverage for null weight result and invalid/unsupported weight failures. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (weightColumn.isNull(i)) { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Null weights are currently silently skipped (continue), which contradicts the PR description that says weights are validated to be non-null. If the intended behavior is to reject NULL weights (at least when the value is non-null), this should throw a SemanticException instead of skipping; otherwise the PR description/tests should be aligned to explicitly define the semantics for NULL weights.
| @Override | ||
| public void addIntInput(Column[] arguments, AggregationMask mask) { | ||
| Column valueColumn = arguments[0]; | ||
| Column weightColumn = arguments[1]; | ||
|
|
||
| int positionCount = mask.getPositionCount(); | ||
|
|
||
| if (mask.isSelectAll()) { | ||
| for (int i = 0; i < valueColumn.getPositionCount(); i++) { | ||
| if (weightColumn.isNull(i)) { | ||
| continue; | ||
| } | ||
| if (weightColumn.getInt(i) < 1) { | ||
| throw new SemanticException("weight must be >= 1, was " + weightColumn.getInt(i)); | ||
| } | ||
| if (!valueColumn.isNull(i)) { | ||
| tDigest.add(valueColumn.getInt(i), weightColumn.getInt(i)); | ||
| } | ||
| } | ||
| } else { | ||
| int[] selectedPositions = mask.getSelectedPositions(); | ||
| int position; | ||
| for (int i = 0; i < positionCount; i++) { | ||
| position = selectedPositions[i]; | ||
| if (weightColumn.isNull(position)) { | ||
| continue; | ||
| } | ||
| if (weightColumn.getInt(position) < 1) { | ||
| throw new SemanticException("weight must be >= 1, was " + weightColumn.getInt(position)); | ||
| } | ||
| if (!valueColumn.isNull(position)) { | ||
| tDigest.add(valueColumn.getInt(position), weightColumn.getInt(position)); | ||
| } |
There was a problem hiding this comment.
Only the non-grouped accumulator was updated to validate/handle weights. Group-by queries use GroupedApproxPercentileWithWeightAccumulator (via AccumulatorFactory) which still calls weightColumn.getInt(...) without null/>=1 checks, so group-by approx_percentile(..., weight, ...) can still accept invalid weights or fail on NULL weights. The same validation logic should be applied to the grouped accumulator to keep behavior consistent.
| "select approx_percentile(s1,null,0.5) from table1", | ||
| new String[] {"_col0"}, | ||
| new String[] {"null,"}, | ||
| DATABASE_NAME); |
There was a problem hiding this comment.
The new NULL-weight test only covers the non-grouped execution path. Since group-by uses a different (grouped) accumulator, it would be good to add an IT that exercises approx_percentile(..., null, ...) (and an invalid weight like -1) under GROUP BY to ensure the grouped implementation matches the intended semantics and doesn’t regress.
| DATABASE_NAME); | |
| DATABASE_NAME); | |
| tableResultSetEqualTest( | |
| "select 1 as g, approx_percentile(s1,null,0.5) from table1 group by 1", | |
| new String[] {"g", "_col1"}, | |
| new String[] {"1,null,"}, | |
| DATABASE_NAME); |
| if (weightColumn.isNull(i)) { | ||
| continue; | ||
| } | ||
| if (weightColumn.getInt(i) < 1) { | ||
| throw new SemanticException("weight must be >= 1, was " + weightColumn.getInt(i)); | ||
| } | ||
| if (!valueColumn.isNull(i)) { | ||
| tDigest.add(valueColumn.getInt(i), weightColumn.getInt(i)); | ||
| } |
There was a problem hiding this comment.
Weight validation happens before checking whether the corresponding value is null. This means rows with NULL values can still throw on an invalid/negative weight even though that row would otherwise be ignored by the aggregation. Consider only reading/validating the weight inside the !valueColumn.isNull(...) branch (and then validating null/<=0 weight there).
…cumulator and handle null cases
This pull request enhances the robustness and correctness of the
approx_percentileaggregation function, particularly when using weighted percentiles. It introduces stricter type checks and runtime validation for the weight parameter, improves error handling, and adds corresponding integration tests to ensure the expected behavior.Validation and Error Handling Improvements:
ApproxPercentileWithWeightAccumulatorto ensure that weights are not null and are greater than or equal to 1; throws aSemanticExceptionif these conditions are not met. [1] [2] [3] [4] [5] [6] [7] [8]approx_percentilefunction inTableMetadataImplto ensure the percentage parameter is always decimal and the weight parameter (if present) is eitherINT32or unknown; throws clear semantic exceptions for unsupported types.Testing Enhancements:
IoTDBTableAggregationIT.javato verify correct handling of null weights and invalid weight values inapprox_percentile, including checks for proper error messages when weights are negative or of unsupported types. [1] [2]Dependency and Import Updates:
SemanticExceptioninApproxPercentileWithWeightAccumulator.javato support new exception handling logic.